Importance of chirality and reduced flexibility of protein side chains: 
A study with square and tetrahedral lattice models 
Side chains of amino acid residues are the determining factor that distinguishes proteins from
other unstable chain polymers. In simple models they are often represented implicitly (e.g., by
spin-states) or simplified as one atom. Here we study side chain effects using two-dimensional
square lattice and three-dimensional tetrahedral lattice models, with explicitly constructed side 
chains formed by two atoms of different chirality and flexibility. We distinguish effects due to 
chirality and effects due to side chain flexibilities, since residues in proteins are L-residues, and 
their side chains adopt different rotameric states. For short chains, we enumerate exhaustively 
all possible conformations. For long chains, we sample effectively rare events such as compact
conformations and obtain complete pictures of ensemble properties of conformations of these models
in all compactness regions. This is made possible by using sequential Monte Carlo techniques based
on the chain growth method. Our results show that both chirality and reduced side chain flexibility
lower the folding entropy significantly for globally compact conformations, suggesting that they are 
important properties of residues to ensure fast folding and stable native structure. This corresponds 
well with our finding that natural amino acid residues have reduced effective flexibility, as evidenced 
by statistical analysis of rotamer libraries and side chain rotatable bonds. We further develop a 
method for calculating the exact side chain entropy for a given backbone structure. We show that
simple rotamer counting overestimates side chain entropy significantly for both extended and
near maximally compact conformations. We find that side chain entropy does not always correlate 
well with main chain packing. With explicit side chains, extended backbones do not have the
largest side chain entropy. Among compact backbones with maximum side chain entropy, helical
structures emerge as the dominating configurations. Our results suggest that side chain entropy
may be an important factor contributing to the formation of alpha helices for compact conformations. 

Keywords: chirality, flexibility, packing, side chain entropy, sequential Monte Carlo.



I. INTRODUCTION 

Side chains of amino acid residues are the determining factor that distinguishes proteins from
other unstable chain polymers. Their arrangement along the primary sequence dictates the native
structure of proteins. Side chains are also responsible for much of the complexity of protein
structure. They pack tightly, but also leave space to form voids and pockets. The effects of
simplified side chains were studied in detail for two-dimensional square lattice and
three-dimensional cubic lattice models in earlier work. Such studies of simplified models played
important roles in elucidating the principles of protein folding, because these models allow
enumeration of all feasible conformations and calculation of exact entropy for short chain
molecules. They are also amenable to detailed sampling for longer chain models. However, the
effects of side chains are still not fully understood. Several studies on side chain effects rely
on implicit models or assign different spin states to each monomer to mimic the internal degrees
of freedom of side chains. It is not clear how realistic these models are without explicit side
chains. In studies where side chains are modeled explicitly, they are simplified: only one atom is
attached to the main chain monomer. Since there is no internal degree of freedom for side chains
of one atom, χ angles and rotameric states of side chains cannot be studied.

In this study, we introduce more realistic side chain models. We distinguish two different side
chain effects that have not been investigated previously. We first study the chirality effects.
Chirality effects at the Cα atom of a residue arise because the four atoms bonded to Cα are
different. The side chain atom Cβ can be attached to different positions of Cα relative to the
other atoms (C, N, and H atoms). In Nature, all amino acid residues with side chains are of the L
configuration instead of the D configuration, i.e., the position of Cβ in relation to the C and N
atoms is always in a unique chiral state. The origin of this bias is unclear and remains a puzzle
in studies of the origin of life. We also study the flexibility effects. Flexibility effects arise
because additional atoms beyond Cβ can rotate around a single side chain chemical bond, regardless
of the chiral state of Cα. These two effects are different: there is a large energetic barrier for
a change of chiral state, which often involves the breaking of a chemical bond. In contrast,
rotation about a single bond is relatively easy.

We use lattice models to study the effects of both chirality and flexibility. We introduce
chirality models for two-dimensional square lattice and three-dimensional tetrahedral lattice
polymers. To model side chain flexibility, we use explicit side chains consisting of two atoms,
which enable the modeling of the rotational degrees of freedom of side chains. Because this leads
to a significant increase in the size of the conformational space, it is difficult to characterize
accurately the ensemble properties of compact conformations of polymers. We use the techniques of
sequential Monte Carlo importance sampling and resampling to generate properly weighted samples of
rare events, such as long chain conformations with maximum compactness.

We examine the distribution of all geometrically feasible conformations of self-avoiding walks on
lattices with side chains of different chirality and flexibility. We focus on their packing
properties and their conformational entropy. Folding into a well defined native structure is
accompanied by a large reduction in conformational entropy. We explore how the entropy of folding
is affected by chirality and flexibility, and how it relates to the compactness of chain polymers
with side chains. Because the absolute number of compact conformations changes dramatically after
incorporation of chirality and side chain flexibility, it is not obvious whether these factors
help or hinder protein folding. Our results indicate that chiral molecules have lower entropy of
folding than achiral models. Models with less side chain flexibility also have significantly lower
entropy of folding than models with more flexible side chains.

Side chain entropy is important for protein folding and its estimation is the subject of several
studies. To calculate side chain entropy precisely for our model polymers, we introduce an
algorithm for counting the exact number of side chain conformations. It is based on the
observation of disconnected sets in the conflict graph of side chain correlations. In comparison,
we find that rotamer counting significantly overestimates side chain entropy, and the difference
is more pronounced in the most extended as well as in the protein-like near-compact regions of
main chain structures.

In addition, we revisit two models of protein pack- 
ing, namely, the jigsaw puzzle model and the nuts 
and bolts model. We show that packing of chain 
polymers with chiral side chains included is more 
like nuts and bolts than jigsaw puzzles. 

The results presented here are in agreement with 
the chiral nature of L-amino acid residues found 
in natural proteins and an analysis of flexibility of 
residues in real proteins. They suggest that both 
chirality and restriction in flexibility make impor- 
tant contributions to protein folding. Our presen- 
tation is organized as follows: We first introduce 
side chain models for chirality and flexibility ef- 
fects in two-dimensional square lattice and three- 
dimensional tetrahedral lattice. This is followed by 
a description of the parameters used in our study 
and the algorithm for counting side chain conformations.
The effects of chirality and flexibility on conformational
entropy, obtained by enumeration and by sequential
Monte Carlo sampling, are then presented.
We then compare rotamer counting and the exact 
method developed here for calculating side chain 
conformational entropy. We conclude with remarks 
and discussion. 



II. MODELS AND METHODS 

a. Lattice side chain models. For two-dimensional square lattice and three-dimensional
tetrahedral lattice models, a side chain consists of one or two atoms attached to each main chain
monomer. Following earlier work, there are no side chains for the two terminal monomers
(Figure 1). For three-dimensional models, we use the tetrahedral lattice instead of the cubic
lattice. The coordination and bond connection of a tetrahedral unit are very similar to those of
carbon atoms with four chemical bonds, carbon being the most abundant element in proteins. Both
chirality and flexibility can be modeled effectively using the tetrahedral lattice. In addition,
the tetrahedral lattice has the advantage that real protein structures can be well approximated.

FIG. 1: Lattice side chain models of 6-mers (a) on square lattice with side chain of size one and
(b) on tetrahedral lattice with side chain of size one. Filled circles represent main chain
monomers and empty circles represent side chain atoms. In these examples, side chains have one
atom. Arrows point to spatial contacts between non-bonded atoms.

b. Models for chirality. A molecule that is distinct from its mirror image is a chiral molecule.
The idea of "chirality" in molecules goes back to Pasteur, who observed in 1848 that crystals of
tartaric acid rotated polarized light in different directions, either to the right (D for
"dextro") or left (L for "levo"). Here we consider chirality due to different attachments of
non-identical atoms to the Cα atom.




FIG. 2: Assignment of chirality for models in planar square lattice and three-dimensional
tetrahedral lattice. (a) A two-dimensional L-residue in square lattice. From the backbone monomer
(filled circle) of residue i + 1 to the side chain atom (sc_i, empty circle) of residue i, then to
the empty site, we turn counterclockwise. (b) A three-dimensional D-residue in tetrahedral
lattice. Taking the view point along the vector pointing from the backbone monomer of residue i to
the empty site, the backbone atoms of residues i + 1 and i - 1, and the side chain atom (sc_i) of
residue i are arranged in a clockwise fashion.



We first introduce chirality for the two-dimensional lattice model. Planar chirality arises if a
two-dimensional molecule and its reflection (mirror image about a line) cannot be superimposed.
For a chiral residue, the placement of side chain atoms is restricted. The chirality of a residue
i is determined by the relative orientation of its attached side chain atom and the preceding and
succeeding main chain monomers of residues i - 1 and i + 1 (Figure 2a). For a chiral residue i, if
we start from the main chain monomer of the succeeding residue i + 1 and go through the side chain
atom of residue i (sc_i) to the unoccupied site (unoccupied by the two backbone monomers of
residues i - 1 and i + 1, and by side chain atom sc_i), the chirality of residue i is L if we turn
counterclockwise, and D if we turn clockwise.
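
The planar rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `planar_chirality` and the use of integer (x, y) tuples for lattice sites are hypothetical choices, and the turn direction of the path from monomer i + 1 through the side chain atom to the empty site is read off from the sign of a two-dimensional cross product.

```python
def planar_chirality(center, prev_bb, next_bb, side):
    """Classify residue i on the square lattice as 'L' or 'D'.

    center  : site of the backbone monomer of residue i
    prev_bb : site of the backbone monomer of residue i - 1
    next_bb : site of the backbone monomer of residue i + 1
    side    : site of the (first) side chain atom of residue i
    All names are hypothetical; sites are integer (x, y) tuples.
    """
    cx, cy = center
    neighbors = {(cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)}
    occupied = {prev_bb, next_bb, side}
    if len(occupied) != 3 or not occupied <= neighbors:
        raise ValueError("prev_bb, next_bb and side must be distinct neighbors of center")
    (empty,) = neighbors - occupied  # the single remaining unoccupied site

    # Path next_bb -> side -> empty: the z-component of the cross product of
    # the two steps is positive for a counterclockwise (left) turn.
    ax, ay = side[0] - next_bb[0], side[1] - next_bb[1]
    bx, by = empty[0] - side[0], empty[1] - side[1]
    cross = ax * by - ay * bx
    return "L" if cross > 0 else "D"
```

Three of the four neighbors of a lattice site are never collinear, so the cross product cannot vanish and every placement is classified as either L or D. Restricting a model to one label at every residue gives a chiral model in this sketch; allowing both gives an achiral one.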

For the three-dimensional tetrahedral lattice, chirality of a residue can be defined realistically
following that of the Cα in amino acids (Figure 2b). We take a view point along the vector
pointing from the backbone monomer of i to the empty site unoccupied by the backbone monomers of
residues i - 1 and i + 1 and the side chain atom of residue i. If the backbone monomers i + 1 and
i - 1 and the side chain of residue i are arranged counterclockwise, the chirality of residue i is
L. If they are arranged clockwise, the chirality is D. For achiral models in both square lattice
and tetrahedral lattice, there is no restriction on the allowed positions of the first side chain
atom, and it can take either of the two available sites unoccupied by the backbone monomers of
residues i - 1, i, and i + 1.

We study both chiral molecules and achiral molecules in this work. In a chiral molecule, the first
atoms of all side chains follow strictly one fixed chirality (either D or L; we use D in this
study). In an achiral molecule, there is no restriction and the first atom of a side chain can
take any unoccupied reachable site.

c. Models for side chain flexibility. Regardless of whether a residue is chiral or achiral, it is
possible to have a flexible side chain if the side chain consists of two or more atoms. Because
the square lattice and the tetrahedral lattice both have a coordination number of four, there are
at most three possible sites available when attaching a new side chain atom. This models well the
χ1 angles of protein side chains, as they can often be grouped into three main clusters: t, g+,
and g-, which stand for trans, gauche positive, and gauche negative, respectively. Depending on
whether any of these sites are forbidden, a side chain atom in the tetrahedral lattice may have 1,
2, or 3 allowed positions (Figure 3). In model M1, the second side chain atom can only be placed
at one fixed position. In model M2, the second side chain atom can be placed at one additional
possible position, and in model M3, the second side chain atom can be placed at any of the three
possible positions. Facing the vector pointing from backbone atom i to its first side chain atom,
the M1 position for the second side chain atom is located in the opposite direction to the vector
connecting backbone atom i to backbone atom i + 1. The M2 model contains an additional site for
the second side chain atom, which is the immediate neighbor of M1 in the counterclockwise
direction. For the M3 model, the second side chain atom can occupy any of the three reachable
sites.

FIG. 3: Rotameric positions of the M1, M2, and M3 side chain flexibility models. Facing the vector
pointing from backbone atom i to its first side chain atom, the M1 position for the second side
chain atom is located in the opposite direction to the vector connecting backbone atom i to
backbone atom i + 1. The M2 model contains an additional site for the second side chain atom,
which is located in the opposite direction to the vector connecting backbone atom i to backbone
atom i - 1. For the M3 model, the second side chain atom can occupy any of the three reachable
sites.

d. Contact and Compactness. We focus on protein-like compact conformations. Two non-bonded
monomers (backbone or side chain) n_i and n_j are in topological contact if they are spatial
neighbors. Two main chain monomers are in contact only if they are not sequential neighbors
(i ≠ j ± 1; see arrows in Figure 1).

The compactness p of a conformation is defined as the ratio between its number of topological
contacts t and the maximum number of contacts t_max attainable for a particular sequence of given
chain length:

$$p = \frac{t}{t_{\max}}, \quad \text{where } 0 \le p \le 1.$$
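
As an illustration of these two definitions, the following minimal Python sketch counts topological contacts for a conformation on the square lattice and converts the count into a compactness value. The function names, the bead-index convention, and the explicit `bonds` list are assumptions made for this example; `t_max` must be supplied by the caller, since it depends on the model.

```python
def count_contacts(beads, bonds):
    """Count topological contacts on the square lattice.

    beads : list of (x, y) sites, main chain monomers and side chain atoms alike
    bonds : list of (i, j) index pairs that are chemically bonded
    A contact is a pair of lattice-adjacent beads that are not bonded.
    """
    occupied = {pos: idx for idx, pos in enumerate(beads)}
    if len(occupied) != len(beads):
        raise ValueError("conformation is not self-avoiding")
    bonded = {frozenset(pair) for pair in bonds}
    contacts = 0
    for (x, y), i in occupied.items():
        # Looking only right and up visits every adjacent pair exactly once.
        for neighbor in ((x + 1, y), (x, y + 1)):
            j = occupied.get(neighbor)
            if j is not None and frozenset((i, j)) not in bonded:
                contacts += 1
    return contacts


def compactness(beads, bonds, t_max):
    """Compactness p = t / t_max for a given maximum number of contacts."""
    return count_contacts(beads, bonds) / t_max


# Hypothetical 4-mer with side chains on the two interior monomers:
# beads 0-3 are the main chain, beads 4 and 5 are side chain atoms.
example_beads = [(0, 0), (1, 0), (2, 0), (3, 0), (1, 1), (2, 1)]
example_bonds = [(0, 1), (1, 2), (2, 3), (1, 4), (2, 5)]
```

In the example conformation the only contact is between the two side chain atoms, which are adjacent but not bonded.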

e. Entropy and excess entropy. We are interested in the effects of different models of side chain
chirality and flexibility on the conformational space of chain polymers. We study only
homopolymers and do not investigate the relationship between sequence and conformation.

Since homopolymers do not fold into a unique stable ground state conformation, we calculate the
entropy for homopolymers to adopt conformations at a specific compactness value p. We define the
entropy S(p) for conformations with compactness p as:

$$S(p) = k_B \ln n(p),$$

where k_B is the Boltzmann constant and n(p) is the number of conformations with compactness p.
Similarly, the side chain entropy S_sc(B) is defined for a fixed backbone conformation B as:

$$S_{sc}(B) = k_B \ln n_{sc}(B),$$

where n_sc(B) is the number of all self-avoiding side chain arrangements for the fixed backbone
conformation B. The overall entropy S for all conformations is given by:

$$S = k_B \ln \sum_i n(p_i) = k_B \ln \sum_j n_{sc}(B_j).$$

The change ΔS in the conformational entropy between the folded state (F) and the unfolded state
(U) is given by:

$$\Delta S = S_F - S_U.$$

For the lattice models used in this study, the folded state is defined as conformations with
compactness p = p_max = 1. Unfolded states correspond to all conformations with compactness
p < 1. We have:

$$\Delta S(p_{\max}) = S(p_{\max}) - S(p < 1).$$

Since conformations with p_max constitute a very small proportion of all conformations,
S(p < 1) ≈ S, and we have:

$$\Delta S(p_{\max}) \approx S(p_{\max}) - S = k_B \ln \frac{n(p_{\max})}{\sum_i n(p_i)} = k_B \ln \omega(p_{\max}),$$

where ω(p_max) is the fraction of maximum compact conformations. For convenience, we define the
folding entropy ΔS_f of the maximum compact conformations as the absolute value of the above
entropy change:

$$\Delta S_f = |\Delta S(p_{\max})| = -k_B \ln \omega(p_{\max}).$$

We define the entropic change ΔS(p) at other compactness values as:

$$\Delta S(p) \equiv |\Delta S(p)| = -k_B \ln \omega(p).$$

To compare folding entropies of models with different chirality and flexibility, we follow earlier
work and define the excess entropy ES_{a,b} for model a when compared to model b as:

$$ES_{a,b} \equiv \Delta S_f(a) - \Delta S_f(b) = -k_B \ln \frac{\omega_a(p_{\max})}{\omega_b(p_{\max})},$$

where ω_a(p_max) and ω_b(p_max) are the fractions of maximum compact conformations for model a
and model b, respectively.
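
The definitions in this subsection reduce to a few logarithms once the number of conformations at each compactness value is known. The sketch below is a minimal illustration in units of k_B; the function names and the dictionaries of counts are hypothetical and are not data from this study.

```python
import math


def entropic_change(counts, p):
    """Return -ln(omega(p)) in units of k_B.

    counts : dict mapping a compactness value to the number of conformations
             found at that compactness (hypothetical input format)
    """
    omega = counts[p] / sum(counts.values())
    return -math.log(omega)


def folding_entropy(counts):
    """Folding entropy: the entropic change at maximum compactness p = 1."""
    return entropic_change(counts, 1.0)


def excess_entropy(counts_a, counts_b):
    """Excess entropy of model a over model b, in units of k_B."""
    return folding_entropy(counts_a) - folding_entropy(counts_b)


# Made-up counts for two toy models; these numbers are illustrative only.
toy_a = {0.0: 60, 0.5: 30, 1.0: 10}
toy_b = {0.0: 900, 0.5: 99, 1.0: 1}
```

Because both terms are logarithms of fractions, the excess entropy depends only on the ratio of the two fractions of maximally compact conformations, not on the absolute number of conformations in either model.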

f. Radius of gyration Rg. The radius of gyration (Rg) is a parameter frequently used to measure
the global compactness of a conformation. For a set of n atoms, Rg is the root-mean-square
distance of the position x_i of each atom i to their geometric center x̄:

$$R_g = \left( \sum_{i=1}^{n} (x_i - \bar{x})^2 / n \right)^{1/2}.$$

For globular proteins, the value of Rg fluctuates but can be predicted with reasonable accuracy
from the number of residues N by the relationship Rg ≈ 2.2 N^0.38, which describes accurately
globally compact proteins.
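
For concreteness, the radius of gyration of a set of atoms can be computed as follows. This is a minimal sketch in plain Python; the function name and the representation of positions as coordinate tuples are choices made for this example.

```python
import math


def radius_of_gyration(coords):
    """Root-mean-square distance of the atoms to their geometric center.

    coords : non-empty list of coordinate tuples, all of the same dimension
    """
    n = len(coords)
    dim = len(coords[0])
    center = [sum(c[k] for c in coords) / n for k in range(dim)]
    mean_sq = sum(
        sum((c[k] - center[k]) ** 2 for k in range(dim)) for c in coords
    ) / n
    return math.sqrt(mean_sq)
```

The same function applies to square lattice and tetrahedral lattice conformations, since it only uses the coordinates of the occupied sites.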

g. Sequential Monte Carlo importance sampling. In this study, we need to estimate properties of
rare events, namely, properties of conformations with the maximum number of contacts, e.g., the
fraction of conformations with p = p_max. Estimating properties of rare events is difficult,
because finding such conformations is challenging when more extended conformations dominate the
whole population of all geometrically feasible self-avoiding walks with side chains. We adopt the
same sequential Monte Carlo strategy for sampling as that of a recent three-dimensional
off-lattice study, where thousands of polymers of length 2,000 at very high compactness values
were successfully generated. Sequential Monte Carlo is an effective strategy based on chain growth
for sampling high-dimensional spaces. The details of studying lattice models using this technique
have been described elsewhere. It was shown previously that sequential Monte Carlo can give
accurate estimates of ensemble properties of lattice conformations, as verified by comparison with
results obtained from exhaustive enumeration.
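
To make the chain growth idea concrete, the following Python sketch grows self-avoiding walks on the square lattice one monomer at a time and attaches an importance weight to each walk. It is a deliberately simplified illustration and not the sampler used in this study: it has no side chains and no resampling step, and all names (`grow_chain`, `estimate_walk_count`) are hypothetical. With these weights, the sample mean is an unbiased estimate of the total number of self-avoiding walks, which can be checked against enumeration for short chains.

```python
import random

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def grow_chain(n_steps, rng):
    """Grow one self-avoiding walk by sequential placement.

    Returns (walk, weight). At each step the next site is drawn uniformly
    from the unoccupied neighbors, and the weight is multiplied by the number
    of such neighbors. A trapped walk gets weight 0.
    """
    walk = [(0, 0)]
    occupied = {(0, 0)}
    weight = 1.0
    for _ in range(n_steps):
        x, y = walk[-1]
        free = [(x + dx, y + dy) for dx, dy in MOVES
                if (x + dx, y + dy) not in occupied]
        if not free:
            return walk, 0.0
        weight *= len(free)
        site = rng.choice(free)
        walk.append(site)
        occupied.add(site)
    return walk, weight


def estimate_walk_count(n_steps, n_samples, seed=0):
    """Estimate the number of n-step self-avoiding walks from weighted samples."""
    rng = random.Random(seed)
    total = sum(grow_chain(n_steps, rng)[1] for _ in range(n_samples))
    return total / n_samples
```

Ensemble averages such as the fraction of walks at a given compactness would be estimated in the same way, as weighted averages over the grown chains. For long chains the weights become very uneven, which is the situation that resampling is meant to address.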

Once a sample conformation is generated, we need to find out whether it is maximally compact. For
two-dimensional square lattice models, the upper bound t* on the number of contacts for polymers
in which all beads (including main chain monomers and side chain atoms) are connected can be
calculated. For any polymer with N beads, t* is:

$$t^* = \begin{cases} N - 2m, & \text{for } m^2 < N \le m(m+1), \\ N - 1 - 2m, & \text{for } m(m+1) < N \le (m+1)^2, \end{cases} \qquad (1)$$

where m is a positive integer. It is easy to verify that this bound is tight for polymers without
side chains and gives the maximum number of contacts t_max.

FIG. 4: A maximum compact conformation on square lattice for achiral model of side chain size
one. Solid circles are backbone monomers, and empty circles are side chain atoms. The main chain
length is 50.
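
Equation (1) is straightforward to evaluate numerically. The helper below is a small sketch, with a hypothetical function name, that determines the integer m for a given number of beads and returns the bound t*.

```python
import math


def contact_upper_bound(n_beads):
    """Upper bound t* on the number of contacts for N connected beads
    on the square lattice, following Equation (1)."""
    if n_beads < 2:
        raise ValueError("need at least two beads")
    m = math.isqrt(n_beads)
    if m * m == n_beads:
        # A perfect square N = (m + 1)^2 falls in the second case
        # of Equation (1), with the smaller value of m.
        m -= 1
    if n_beads <= m * (m + 1):
        return n_beads - 2 * m
    return n_beads - 1 - 2 * m
```

For example, a main chain of length 50 with one side chain atom on each of its 48 interior monomers has N = 98 beads, which falls in the second case with m = 9.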

Finding the maximum number of contacts for models with side chains is more difficult, since no
closed-form answers are known for the various side chain models studied here. The compactness p is
therefore difficult to calculate for long chain polymers. With the introduction of side chains, it
is possible that the maximum compact conformations may not attain t* as t_max, due to the
requirement of side chain connectivity and self-avoidance. For two-dimensional square lattice
models with side chains of size one, we find from exhaustive enumeration that there are
conformations with the maximum contact number t* for chains up to length N = 18 in the achiral
model. No conformations with t* contacts exist for the chiral model. For longer chains, we
generate samples of conformations using sequential Monte Carlo for lengths up to N = 100. We found
that for achiral models, there exist sampled conformations with t* contacts at every length from
19 to 100. This suggests that achiral models with side chains of size one likely have t* = t_max
at length N > 2. It also indicates that this sampling strategy is effective and that our method
can give correct estimates of the maximum number of contacts t_max and the compactness p for the
two-dimensional achiral model. An example of a maximum compact conformation for the achiral model
of length 50 on the square lattice, obtained using sequential Monte Carlo, is shown in Figure 4,
with N = 98 beads and m = 9.

Verified successful results in two-dimensional models are helpful in assessing the effectiveness
of sampling for three-dimensional models. Both tetrahedral lattice models and square lattice
models have the same coordination number of four. In addition, the conformations of the chiral
model are a subset of those of the achiral model. We postulate that our method can give
satisfactory estimates of p_max for the tetrahedral models used in this study.

h. Exact calculation of side chain entropy. Rotamer counting is a widely used method to estimate
the side chain entropy of residues in proteins when the backbone structure is given. The idea is
to count the available rotameric states for each monomer independently, and estimate the total
number of states by multiplication. This approach would be accurate if all possible placements of
side chains at different residues were independent. The problem is that not all combinations of
rotameric states for residues along the main chain are self-avoiding. Hence this method inherently
overestimates conformational entropy. The extent of the overestimation and its effect on assessing
protein folding entropy are unknown.

Calculating the exact number of all valid side chain conformations for a given main chain
structure is challenging, since this requires explicit enumeration of all possible spatial
arrangements of side chains. Here we introduce an algorithm for counting side chain conformations
based on the divide-and-conquer paradigm.

For a fixed backbone structure, if the placement of side chain atoms of a residue affects the
allowed positions of side chain atoms of another residue, we say there is a conflict between the
side chains of these two residues. We can construct a conflict graph G = (V, E), where V is the
set of residues, and E is the set of edges representing conflicts between pairs of residues. All
residues in a molecule can be grouped into m individual sets, each representing a disconnected
component of the conflict graph G. When two sets are disconnected, the side chain placement of
residues in one set does not affect the placement of side chains of residues in the other set. The
disconnected components in graph G can be identified using depth-first search. We can then
calculate the number of different side chain arrangements n_i for each set i by enumeration. The
total number N of side chain conformations of all residues is obtained by multiplication:

$$N = \prod_{i=1}^{m} n_i.$$

A simple example is shown in Figures 5a and 5b, where a graph is constructed for an extended
backbone structure. The residues can be decomposed into two independent components, one formed by
residues with side chains above the main chain, and another formed by residues with side chains
below the main chain.

FIG. 5: An illustration of the calculation of side chain entropy. (a) An extended conformation.
Filled circles are main chain monomers, gray circles are first side chain atoms, and empty circles
are positions that could be occupied by side chain atoms of either of two different residues.
(b) The conflict graph of the conformation. There are two disconnected components in the conflict
graph, one formed by residues 3, 5, and 7, and another by residues 2, 4, 6, and 8. The latter can
be further divided into two smaller components by cutting at position a.
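
The divide-and-conquer counting described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in this study: each residue is given a set of candidate sites for a single movable side chain atom, two residues are taken to conflict when they share a candidate site, and the names `conflict_components`, `count_valid`, `exact_count`, `rotamer_count`, and the toy data are all hypothetical.

```python
from itertools import product


def conflict_components(candidates):
    """Connected components of the conflict graph, found by depth-first search.

    candidates : dict mapping a residue to the set of sites its side chain
                 atom may occupy. Two residues conflict if the sets overlap.
    """
    residues = list(candidates)
    adjacent = {r: set() for r in residues}
    for k, r in enumerate(residues):
        for s in residues[k + 1:]:
            if candidates[r] & candidates[s]:
                adjacent[r].add(s)
                adjacent[s].add(r)
    seen, components = set(), []
    for r in residues:
        if r in seen:
            continue
        stack, component = [r], []
        seen.add(r)
        while stack:  # iterative depth-first search
            u = stack.pop()
            component.append(u)
            for v in adjacent[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(sorted(component))
    return components


def count_valid(candidates, residues):
    """Enumerate placements for the given residues; count self-avoiding ones."""
    options = [sorted(candidates[r]) for r in residues]
    return sum(1 for choice in product(*options)
               if len(set(choice)) == len(choice))


def exact_count(candidates):
    """Exact number of placements: product of per-component counts."""
    total = 1
    for component in conflict_components(candidates):
        total *= count_valid(candidates, component)
    return total


def rotamer_count(candidates):
    """Rotamer counting: residues treated as independent."""
    total = 1
    for sites in candidates.values():
        total *= len(sites)
    return total


# Toy data with made-up site labels: residues 2, 4, 6, 8 share sites on one
# side of an extended backbone, residues 3, 5, 7 on the other side.
toy_candidates = {
    2: {"a1", "a2"}, 4: {"a2", "a3"}, 6: {"a3", "a4"}, 8: {"a4", "a5"},
    3: {"b1", "b2"}, 5: {"b2", "b3"}, 7: {"b3", "b4"},
}
```

Enumerating each component separately costs the sum of the per-component search spaces rather than their product, which is the point of the decomposition. Comparing `exact_count` with `rotamer_count` on the toy data shows how treating residues as independent inflates the count.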

i. Helix content on tetrahedral lattice. For a fragment of four consecutive monomers (from i to
i + 3), there are three possible conformations on a tetrahedral lattice: left turn fragment, right
turn fragment, and straight fragment (see Figure 6). For a fragment of five consecutive monomers,
the 4-prefix fragment (i to i + 3) and the 4-suffix fragment (i + 1 to i + 4) can each have any of
the above three conformations. If the 4-prefix and 4-suffix fragments are both left turns or both
right turns, this five-monomer fragment is defined as a helix. Helices with all left turns are
defined as left-handed helices, and helices with all right turns are defined as right-handed
helices. We include both types of helices and their mixture when calculating the helix content of
a backbone conformation. Specifically, the helix content h of a backbone conformation of length N
is:

$$h = \sum_{i=0}^{N-5} I(i) / (N - 4),$$

where I(i) = 1 if the fragment of residues from i to i + 4 is a helix by the above definition, and
I(i) = 0 otherwise.
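
A possible way to evaluate this definition for a backbone given as a list of three-dimensional coordinates is sketched below. The turn type of each four-monomer fragment is read from the sign of the scalar triple product of its three bond vectors, which vanishes for a planar (straight) fragment. Labeling a positive sign as "R" and a negative sign as "L" is an assumed convention for this example, and all function names are hypothetical.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def turn_types(backbone):
    """Classify every four-monomer fragment as 'R', 'L', or 'S' (straight)."""
    bonds = [_sub(backbone[k + 1], backbone[k]) for k in range(len(backbone) - 1)]
    turns = []
    for k in range(len(bonds) - 2):
        triple = _dot(bonds[k], _cross(bonds[k + 1], bonds[k + 2]))
        turns.append("R" if triple > 0 else "L" if triple < 0 else "S")
    return turns


def helix_content(backbone):
    """Fraction of five-monomer fragments whose two four-monomer
    sub-fragments are turns of the same handedness."""
    n = len(backbone)
    if n < 5:
        raise ValueError("need at least five monomers")
    turns = turn_types(backbone)
    helical = sum(1 for k in range(n - 4)
                  if turns[k] == turns[k + 1] and turns[k] != "S")
    return helical / (n - 4)


# Hypothetical backbones built from tetrahedral (diamond) lattice bond vectors.
helix_6 = [(0, 0, 0), (1, 1, 1), (0, 2, 2), (-1, 1, 3), (0, 0, 4), (1, 1, 5)]
zigzag_5 = [(0, 0, 0), (1, 1, 1), (0, 2, 2), (1, 3, 3), (0, 4, 4)]
mixed_6 = [(0, 0, 0), (1, 1, 1), (0, 2, 2), (-1, 1, 3), (0, 0, 4), (1, -1, 3)]
```

The first example backbone winds around the z axis with the same turn sense at every step, the second is a planar all-straight zigzag, and the third reverses its turn sense at the last monomer.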



III. RESULTS 
A. Exact conformational space by enumeration 
FIG. 7: Distribution of conformations over compactness for models of side chain size one.
(a) Polymer chains of length 16 on two-dimensional square lattice (2Za1 and 2Zc1) obtained by
exhaustive enumeration, (b) polymer chains of length 16 on three-dimensional tetrahedral lattice
(3Ta1 and 3Tc1) obtained by exhaustive enumeration, (c) polymer chains of length 30 on
two-dimensional square lattice obtained by sequential Monte Carlo, and (d) polymer chains of
length 30 on three-dimensional tetrahedral lattice obtained by sequential Monte Carlo.



For short polymers with side chains, we obtain a complete picture of the ensemble properties of
conformations by exhaustive enumeration. Table I lists the total number of conformations of
different side chain models obtained from exhaustive enumeration. For longer polymers, the full
conformational space cannot be enumerated, and it is necessary to use sequential Monte Carlo
sampling to generate properly weighted samples from the uniform distribution of all geometrically
feasible self-avoiding walks with various types of side chains.



B. Effects of side chain chirality 



j. Distribution of conformations and folding entropy. How does the introduction of chirality
affect the distribution of conformations and the entropy of folding? We first calculate the
distributions of conformations over compactness p for a given main chain length N. The fraction
f(p) of conformations with compactness p is:

$$f(p) = \frac{w(p)}{\sum_{p'} w(p')},$$

where w(p) is the number of conformations found with compactness p.
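
For very short chains, the distribution f(p) can be obtained by brute force. The Python sketch below enumerates all self-avoiding walks of a given number of monomers on the square lattice, without side chains, and tabulates the number of walks by contact count. It is a simplified illustration with hypothetical function names: walks related by rotation, reflection, or reversal are all counted separately, and the maximum number of contacts is taken to be the largest value observed in the enumeration.

```python
from collections import Counter

MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))


def enumerate_contact_histogram(n_monomers):
    """Return a Counter {number of contacts: number of self-avoiding walks}."""
    histogram = Counter()

    def extend(walk, occupied, contacts):
        if len(walk) == n_monomers:
            histogram[contacts] += 1
            return
        x, y = walk[-1]
        for dx, dy in MOVES:
            site = (x + dx, y + dy)
            if site in occupied:
                continue
            # Occupied neighbors of the new site, minus the bonded predecessor.
            new_contacts = sum((site[0] + ex, site[1] + ey) in occupied
                               for ex, ey in MOVES) - 1
            walk.append(site)
            occupied.add(site)
            extend(walk, occupied, contacts + new_contacts)
            walk.pop()
            occupied.remove(site)

    extend([(0, 0)], {(0, 0)}, 0)
    return histogram


def compactness_distribution(histogram):
    """Convert contact counts into {compactness p: fraction f(p)}."""
    total = sum(histogram.values())
    t_max = max(histogram)
    return {(t / t_max if t_max else 0.0): w / total
            for t, w in histogram.items()}
```

The cost of this enumeration grows exponentially with chain length, which is why sampling is needed for the longer chains considered in this study.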

The distributions of enumerated conformations of chiral and achiral polymers at length 16 on the
two-dimensional square lattice and the three-dimensional tetrahedral lattice are shown in Figures
7a and 7b, respectively. The distributions of conformations at length 30 estimated by sequential
Monte Carlo for the square and tetrahedral lattices are shown in Figures 7c and 7d, respectively.
The insets provide details of conformations in the compact region. Distributions of less compact
conformations may be useful for modeling proteins in the unfolded state. Results from enumeration
and sampling show similar patterns. In both two- and three-dimensional space, chiral and achiral
polymers have low average compactness when the only interaction between residues is due to
excluded volume, as in good solvent. However, the distributions of conformations of these two
chirality models are clearly different. Chiral models have overall more compact conformations than
achiral models.

Since proteins are highly compact, we consider conformations in the high compactness region,
especially in the region where p = 1. We calculate the folding entropy ΔS_f for the ensemble of
conformations with maximum compactness p = 1 and the entropic change ΔS(p) in other compactness
regions. Figures 8a and 8b show exact ΔS_f and ΔS(p) for two- and three-dimensional polymers of
length 16 calculated by enumeration.

TABLE I: Number of conformations of an n-polymer by enumeration for different side-chain models.
Here "2Z" stands for two-dimensional square lattice, "3T" for three-dimensional tetrahedral
lattice, "c" for chiral models, "a" for achiral models, "1" and "2" for side-chain models of 1 and
2 atoms, respectively, and "M1-M3" for specific models of side-chain flexibility, where the second
side chain atom can have 1-3 allowed positions, respectively. Specifically, we have: 2Za1: square
lattice achiral model with side chain size of one, 2Zc1: square lattice chiral model with side
chain size of one, 3Ta1: tetrahedral lattice achiral model with side chain size of one, 3Tc1:
tetrahedral lattice chiral model with side chain size of one, 3Tc2.M1-M3: tetrahedral lattice
chiral model with side chain size of two and flexibility models of M1-M3, respectively.

For chiral models, the change in entropy during 
folding to conformations of maximum compactness 
is much smaller than that of achiral models. For chi- 
ral and achiral conformations on tetrahedral lattice 
with side chain size 1 (3Tc1 and 3Ta1 in Table I),
the fraction of maximum compact conformations is 
much higher for chiral molecules (1.8 x 10~^) than 
for achiral molecules (4.96 x 10~^) at length 16. Chi- 
rality clearly favors compact conformations, despite 
the fact that the absolute number of conformations 
of maximum compactness is much smaller for chiral 
model (11 conformations) than for achiral models 
(144 conformations). 

k. Excess folding entropy due to chirality. To 
examine the differences of folding entropy for poly- 
mers under two different chirality models, we cal- 
culate excess entropy of folding ESc,a{p) of chiral 



model over achiral model for conformations at max- 
imum compactness p = 1.0. The scaling relation- 
ships of ESc,a{i-0) with polymer backbone length 
N are shown in Figures and for square lattice 
model and tetrahedral lattice model, respectively. It 
provides information on whether and how the effects 
of chirality changes with chain length. 

Exact $ES_{c,a}(1.0)$ obtained by enumeration on the
square lattice up to N = 18 fluctuates with chain
length (Figure 9a, unfilled circles). On the tetrahedral
lattice, $ES_{c,a}$ increases with N to 8.2 at length
N = 16 (Figure 9b, unfilled circles). This trend becomes
clearer in results obtained by sequential Monte Carlo
for conformations up to length N = 50 (Figure 9b):
as chain length increases, the excess entropy increases
linearly. This suggests that the effect of chirality on
the entropy of folding increases with chain length on the
tetrahedral lattice, where the relationship can be
characterized by a linear regression ($R^2 = 0.98$) of the
form $ES_{c,a}(1.0, N) = aN + b$, with $a = 0.75 \pm 0.06$
and $b = -4.7 \pm 2.4$. The effects of
chirality in increasing the fraction of compact chains
FIG. 8: Folding entropy for different models of chirality
with side chain size of one. (a) Exact folding entropy
$\Delta S_f$ at maximum compactness and entropic change
$\Delta S(\rho)$ at other compactness regions $\rho$ for
polymers of length 16 calculated by exhaustive enumeration
on the two-dimensional square lattice, (b) exact folding
entropy $\Delta S_f$ and entropic change $\Delta S(\rho)$
for polymers of length 16 calculated by exhaustive
enumeration on the three-dimensional tetrahedral lattice,
(c) estimated $\Delta S_f$ and $\Delta S(\rho)$ for polymers
of length 30 calculated by sequential Monte Carlo on the
two-dimensional square lattice, and (d) on the
three-dimensional tetrahedral lattice.



become more pronounced as chain length increases.
On the square lattice the effect of chirality on excess
folding entropy also increases with chain length, but the
trend is not as clear as that on the tetrahedral lattice.
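
As a worked illustration of the linear scaling reported above for the tetrahedral lattice, the sketch below evaluates the fitted line at a few chain lengths and shows an ordinary least-squares fit of a straight line. The coefficients 0.75 and -4.7 are the values quoted in the text; the function names are hypothetical, and the points used to exercise the fit are synthetic values generated from the reported line, not the sequential Monte Carlo estimates themselves.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def excess_entropy_fit(n, a=0.75, b=-4.7):
    """Excess folding entropy predicted by the reported fit ES = a*N + b."""
    return a * n + b

# Evaluate the reported line at a few chain lengths.
for length in (16, 30, 50):
    print(length, round(excess_entropy_fit(length), 2))

# Synthetic points lying exactly on the reported line (illustration only).
lengths = [20, 30, 40, 50]
values = [excess_entropy_fit(n) for n in lengths]
slope, intercept = fit_line(lengths, values)
```

At N = 16 the fitted line gives 7.3, somewhat below the exact enumerated value of 8.2 quoted above, which is consistent with the fit describing the trend for longer chains.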



C. Effects of side chain flexibility 

Because natural amino acid residues are three-dimensional
chiral molecules, we describe only results on flexibility
effects using chiral models on the three-dimensional
tetrahedral lattice. A benefit of studying the chiral
model is that the conformational space of the side chain
is greatly reduced, so the folding entropy can be studied
for longer polymers. We omit results on the
two-dimensional square lattice, which are similar to
those on the tetrahedral lattice shown here.

l. Distribution of conformations and folding entropy.
To study the effect of side chain flexibility, we first
examine the exact distributions of conformations obtained
by enumeration for polymers of length N = 12. These are
obtained for three different models M1, M2, and M3 of
different side chain flexibility, where the second
side-chain atom can have 1, 2, and 3 allowed positions,
respectively. The distributions of conformations of the
three models show that M1 has much higher average
compactness than the other two models (Figure 10a).
That is, less flexible side chains are more likely to
form compact conformations. The M1 model also has the
lowest folding entropy for compact conformations
(Figure 10b). Models M2 and M3 have similar
distributions, with M2 slightly more compact on average.

FIG. 9: Excess folding entropy due to chirality for
maximum compact conformations calculated for models of
side chain size one, (a) on the two-dimensional square
lattice, and (b) on the three-dimensional tetrahedral
lattice. Unfilled circles are excess folding entropy
calculated by exact enumeration. Filled circles are
excess folding entropy estimated by sequential Monte
Carlo.

For polymers of chain length N = 30, the distributions
of conformations estimated by sequential Monte Carlo
show a similar pattern. Conformations from model M1 are
on average more compact (Figure 10c): the largest number
of conformations is found around $\rho \approx 0.42$,
compared to $\rho \approx 0.34$ for models M2 and M3.
For compact conformations (e.g., $\rho > 0.8$), the
entropic change is also much smaller for the M1 model of
inflexible side chains. This suggests that there is a
significant decrease in folding entropy when side chains
lose their flexibility.

m. Excess folding entropy due to side chain flexibility.
The excess entropy of folding $ES_{M_1,M_3}(1.0)$ of
model M1 compared to model M3 for maximum compact
conformations is shown in Fig. 11. It can be
characterized by a linear regression
$ES_{M_1,M_3}(1.0, N) = aN + b$, with $a = 0.27 \pm 0.03$,
$b = 4.75 \pm 1.0$, and $R^2 = 0.90$. These results
suggest that inflexibility of side chains plays an
important role in obtaining compact conformations. The
effects of inflexibility in increasing the fraction of
FIG. 10: Distribution of conformations and folding entropy
over compactness for models with different side chain
flexibility. Exact distribution of (a) conformations and
(b) folding entropy for chains of length 12 obtained by
exhaustive enumeration, and estimated distribution of
(c) conformations and (d) folding entropy for chains of
length 30 obtained by sampling.



compact chains become more pronounced as chain 
length increases. 



D. Effects of side chains: packing, entropy, and 
secondary structure. 



n. Jigsaw puzzle or nuts and bolts? Two differing views
on the effects of side chains can be summarized by the
jigsaw puzzle model and the nuts and bolts model
(Fig. 12). This comparison was studied in detail in a
seminal earlier work, where homopolymers with side chains
of size one were studied. According to the nuts and bolts
model, a small expansion in the volume of a compact native
protein leads to a large increase in side-chain entropy.
That is, side-chain entropy increases sharply as the main
chain becomes less packed than the native state.
According to the jigsaw puzzle model, a small expansion
in volume does not lead to a significant change in side
chain entropy when the molecule is compact. In a model
supporting the jigsaw puzzle model, it is estimated that
a 25% expansion in volume relative to the native core
volume is required before a sudden unfreezing of core
side-chain rotameric degrees of freedom incurs a sharp
increase in entropy. In a model supporting the nuts and
bolts model, a small expansion in volume from the compact
native state produces a steep increase in side-chain
rotational entropy. The increase in side-chain degrees of
freedom is linked to the increase in main-chain degrees
of freedom. In that study, the size of the side chain is
1. For real proteins, except glycine and alanine, all
other amino acid residues have more than one heavy atom
in their side chains. What effects do side chains of
larger sizes have on side chain packing? With the method
of exact computation of side chain entropy, we revisit
this problem and examine the packing of chiral polymers
with side chains formed by two atoms (M3 model).
Following earlier work, we use the radius of gyration
$R_g$ of backbone monomers to measure main chain packing
density.

FIG. 11: Excess entropy of folding for conformations from
the M3 model over conformations from the M1 model,
estimated by sequential Monte Carlo.

FIG. 12: A schematic comparison showing the qualitative
difference in the dependence of side-chain entropy on
chain density in (a) a jigsaw puzzle model for side-chain
packing, in which a side chain freezing effect occurs in
the near compact region; and (b) a nuts and bolts model,
in which main chain and side chain degrees of freedom are
linked. (Adapted from Fig. 13 of the cited reference.)

We examine the distribution of side chain entropy
$S_{sc}$ over the full range of main chain packing
density measured by $R_g$. Exact calculation of side
chain entropy for each of the exhaustively enumerated
main chain conformations of length 10 shows that side
chain entropy does not always correlate well with main
chain packing density (Figure 13a). Conformations with
the most extended main chain structures
($R_g \approx 2.2$ to $2.4$) are not those with maximum
side chain entropy, and many compact conformations
($R_g \approx 1.4$) have very large side chain entropy.

We then use sequential Monte Carlo to generate longer
main chain structures up to length 30, and calculate
exactly the side chain entropy for each of the sampled
main chain structures. We assess the correlation of side
chain entropy and main chain packing by calculating the
average side chain entropy for conformations whose
backbone $R_g$ values fall into different intervals. As
shown in Figure 13, although some of the compact main
chain structures of length 30 have very small $R_g$, they
can still have substantial side chain entropy.
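
The binning procedure described above can be sketched as follows. The radius of gyration is the standard root-mean-square distance of the backbone monomers from their centroid; the weighted averaging within each interval, the function names, and the sample tuple layout are assumptions made for illustration rather than the implementation used in this study.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of backbone monomers from their centroid."""
    n = len(coords)
    center = [sum(c[k] for c in coords) / n for k in range(3)]
    sq = sum(sum((c[k] - center[k]) ** 2 for k in range(3)) for c in coords)
    return math.sqrt(sq / n)

def mean_entropy_by_rg(samples, bin_width):
    """Weighted mean side chain entropy per Rg interval.

    samples: iterable of (rg, side_chain_entropy, weight) tuples, where the
    weight is the importance weight of the sampled backbone (assumed layout).
    Returns {bin_index: weighted mean entropy}, where bin_index = floor(rg / bin_width).
    """
    sums = {}
    for rg, entropy, weight in samples:
        b = int(rg // bin_width)
        weighted_sum, total_weight = sums.get(b, (0.0, 0.0))
        sums[b] = (weighted_sum + weight * entropy, total_weight + weight)
    return {b: ws / tw for b, (ws, tw) in sorted(sums.items())}

# Hypothetical samples for illustration only.
toy_samples = [(1.2, 2.0, 1.0), (1.4, 4.0, 3.0), (2.6, 1.0, 1.0)]
toy_profile = mean_entropy_by_rg(toy_samples, bin_width=1.0)
```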

On average, there is a sharp decrease in the num- 
ber of side chain conformations at compact regions 
where main chain Rg values are small for polymers 
of chain length N ~ 20. There is no plateau at 
compact regions with small Rg value, which would 
be characteristic of the jigsaw puzzle model. Our 
study using chiral model of homopolymers with two 
side chain atoms therefore is consistent with the nuts 
and bolts model of protein packing. 

o. Rotamer counting. Estimating side chain entropy is an
important problem that has received much attention. For
example, it was proposed in an earlier study that
side-chain entropy should be used as a criterion
alternative to packing density to assess protein packing.
The models developed in this study allow us to calculate
side-chain entropy explicitly. We compare the numbers of
side chain conformations obtained by exact calculation
and by estimation using rotamer counting. With sequential
Monte Carlo, we can access polymers in the full range of
main chain compactness, including both maximally compact
backbones and fully extended backbones, as well as
polymers with compactness in between. Because each
sampled conformation is properly weighted, we thus have
an accurate picture of the full distribution of all
feasible geometric conformations for various side chain
models. This is different from other approaches such as
molecular dynamics, where one typically samples
conformations around the native structure.

FIG. 13: Side chain entropy and main chain compactness of
tetrahedral chiral conformations with two-atom side
chains, where the second atom can have three allowed
positions. (a) Exact number of side chain conformations
for all main chain structures of length 10 over the main
chain radius of gyration ($R_g$), (b) exact number of side
chain conformations for a set of 1,000 samples of main
chain structures of length 20 over $R_g$, (c) expected
side chain entropy of main chain structures of length 30
at different radii of gyration, as estimated from exact
calculation of side chain entropy based on 1,000,000
properly weighted samples of main chain structures (EC,
solid line). A resampling technique described previously
was used to obtain samples in each interval of $R_g$
values. For comparison, side chain entropies estimated by
rotamer counting are also plotted (RC, dotted line), and
(d) difference in expected side chain entropy by exact
calculation and rotamer counting from sampled backbone
structures at length 20 (solid line) and 30 (dotted
line).

We find that the number of side chain conformations
given by rotamer counting is consistently higher than
the number obtained from exact enumeration (Figures 13c
and 13d). The difference between these two methods
varies with main chain compactness. Over-estimation by
rotamer counting is especially large for very extended
conformations. It is also pronounced near the maximum
compact region. That is, there is a substantial
unaccounted effect of side chain correlation in reducing
side chain entropy due
FIG. 14: Side chain entropy and compactness constraint
favor the formation of helix-like structures. An example of
a main chain structure of length 10 with maximum side chain
entropy is shown here. It adopts a helix-like conformation.



to excluded volume in rotamer counting, and this
effect is more pronounced in both extended and near
maximum compact regions.
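
The source of this over-estimation can be seen in a minimal sketch. It assumes that rotamer counting multiplies the number of positions available to each side chain independently, while the exact count admits only joint arrangements in which no two side chain atoms occupy the same lattice site. The function names and the three-residue example are hypothetical and are not taken from the enumerations reported here.

```python
import math
from itertools import product

def rotamer_count(candidates):
    """Product of per-residue numbers of allowed sites (residues treated independently)."""
    result = 1
    for sites in candidates:
        result *= len(sites)
    return result

def exact_count(candidates):
    """Number of joint assignments in which no lattice site is used twice."""
    return sum(
        1 for choice in product(*candidates) if len(set(choice)) == len(choice)
    )

# Hypothetical example: residues 0 and 1 can both reach site "b".
toy_candidates = [("a", "b"), ("b", "c"), ("d",)]

rc = rotamer_count(toy_candidates)   # counts the clashing pair ("b", "b") as valid
ec = exact_count(toy_candidates)     # excludes it
print(f"rotamer counting: S = {math.log(rc):.3f}, exact: S = {math.log(ec):.3f}")
```

In this toy case rotamer counting gives 4 arrangements and the exact count gives 3, because the arrangement that places two side chain atoms on the shared site is not physically allowed.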

p. Side chain entropy, compactness, and secondary
structures. How does side chain entropy affect the
formation of secondary structures? An example of a
conformation at length N = 10 from the tetrahedral chiral
model (two side chain atoms, with three possible
positions for the second atom) with the maximum number of
possible side chain conformations is shown in Figure 14.
Since the second side chain atom in the M3 model can
have three possible sites, side chain entropies of
different residues may be correlated due to the excluded
volume effect if their side chain atoms can reach the
same lattice site. The backbone structure of this
particular conformation is arranged in such a way that
none of the second atoms from different side chains can
occupy the same lattice site. That is, in the conflict
graph of this backbone, all vertices representing
individual residues are disconnected, and there are
N = 8 independent components in the graph. There is no
correlation between side chain entropies of any residues
in this backbone structure, and the total side chain
entropy is simply determined by the total number of
states of side chains, $\prod_{i=1}^{N} n_i$, where
$n_i = 3$ for the M3 model. It is remarkable that the
spatial arrangement of this backbone structure resembles
that of a helix. This suggests that the formation of
helical secondary structures is strongly favored by side
chain entropy for compact conformations.
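
The role of the conflict graph in this argument can be sketched in code. The sketch assumes that two residues are joined by an edge whenever their second side chain atoms can reach a common lattice site, that components are enumerated independently, and that the total number of side chain states is the product over components. The function names and the site labels are hypothetical.

```python
from itertools import product

def conflict_components(candidates):
    """Connected components of the conflict graph (residues sharing a reachable site)."""
    n = len(candidates)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if j not in seen and set(candidates[i]) & set(candidates[j]):
                    seen.add(j)
                    stack.append(j)
        components.append(sorted(comp))
    return components

def count_component(candidates, comp):
    """Exhaustively count clash-free assignments within one component."""
    sets = [candidates[i] for i in comp]
    return sum(1 for c in product(*sets) if len(set(c)) == len(c))

def exact_side_chain_states(candidates):
    """Product of per-component counts; exact because components share no sites."""
    total = 1
    for comp in conflict_components(candidates):
        total *= count_component(candidates, comp)
    return total

# Hypothetical helix-like case: 8 residues, each with 3 private sites.
helix_like = [tuple(f"r{i}s{k}" for k in range(3)) for i in range(8)]
```

When every residue forms its own component, as in the helix-like backbone described above, the count reduces to the simple product of the per-residue numbers of states.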

In contrast, the most extended backbone has 
FIG. 15: Distributions of mean helix content for backbone
conformations of the M3 model with maximum side chain
entropy at different compactness, as measured by radius of
gyration $R_g$, for chains of length N = 11 and N = 12. We
use 9 bins of uniform width for backbone conformations whose
$R_g$ values are all within the range of 1.8 to 2.4 lattice
unit lengths. Compact backbone conformations have higher
helix content (e.g., the 0-th and the 1-st bins).



much smaller side chain entropy (Figure |SJ). Al- 
though the second atom of all side chain can have 
three possible positions, an empty site reachable 
from two residues can be taken by the side chain 
of only one residue, and the total number of possi- 
ble states for side chains is much smaller. The con- 
flict graph of the backbone structure in Figure [3 has 
only two independent disconnected components, one 
formed by residues whose side chains are pointing 
up, and another formed by those whose side chains 
are pointing down. We found that the mean helix 
content for backbones with maximum side chain en- 
tropy increases rapidly as backbones become more 
compact. Intra-chain hydrogen bonding has long been
thought to be the determining factor for the formation
of helical secondary structures. Side chain entropy
was shown to be an opposing factor for helix formation
by molecular dynamics simulation. Results obtained here
show that the combination of side chain entropy and
compactness constraint leads to a preference for helix
formation (Figure 15). Helices have been observed in
polyproline molecules as small as three to five residues
on the basis of vibrational and ultraviolet CD
measurements. Polyproline molecules cannot form
intrachain hydrogen bonds. Experimental results on
polypeptoids also suggest that side chain entropy can
lead to the formation of significant helical structures.
For example, artificially synthesized polypeptoids lack
amide protons and are incapable of forming intra-chain
hydrogen bonds. However, they can form monomeric alpha
helices, as evidenced by CD spectra studies of pentameric
and octameric peptoids. These alpha helices display
characteristics of peptide behavior such as cooperative
pH- and temperature-induced unfolding in aqueous
solution. Examination of the structural details of these
artificially synthesized polypeptoids indicates that the
side chain steric interaction found in extended backbone
conformations is effectively avoided in helical
conformations, as shown in our model study. The excluded
volume effect of side chains leads to a preference for
helical backbone conformations over extended backbone
conformations. This consideration may be useful for the
rational design of foldable polymers.



IV. DISCUSSION 

In this study, we have developed two and three 
dimensional lattice models with explicit side chains. 
We make the distinction between the effects due to 
side chain chirality and to side chain flexibility. Side 
chains do not readily convert between configurations 
of different chiralities, whereas flexible side chains 
with two or more atoms can easily take different ro- 
tameric states when spatially feasible. We examine 
specifically the effects of side chain chirality and side 
chain flexibility on the distribution of polymers at 
different compactness, and their effects on folding 
entropy. 

We find that polymers from chiral models are on average
more compact than those from achiral models. Chiral
models also have significantly smaller folding entropy
into compact conformations than achiral models. The
excess folding entropy between achiral and chiral models
increases linearly with chain length for long chains,
suggesting that the effects of chirality become more
important for long chain polymers.

We also find that models with less flexible side
chains have lower entropy of folding than those with
more flexible side chains. Polymers with more flexible
side chains may be thought to have a better chance
of fitting into a compact state. However, there is a
large entropic cost associated with flexible side chains.
The excess entropy of flexible over inflexible side
chain models also increases with chain length. These
findings suggest that amino acid residues in proteins
need to maintain a reduced flexibility to ensure fast
folding and a stable native structure.

With explicit side chains, our study also confirms 
the conclusion of an earlier study based on sim- 
pler side chain models, namely, side chain packing is 
more like nuts-and-bolts rather than jigsaw puzzle, 
and main chain and side chain degrees of freedom 
are linked. 

It is informative to examine the side chain flexibility
of natural amino acid residues. Among the 20 amino
acids, all non-polar amino acid residues either have
branched side chains or are aromatic with ring
structures. On average they are rather inflexible: the
total number of rotatable bonds divided by the number
of side chain atoms is small (Table II). In contrast,
side chains of polar or ionizable residues such as
lysine and arginine have more rotatable bonds and
higher flexibility. However, this difference can be
rationalized by the observation that side chains of
polar and ionizable residues are often involved in
electrostatic ion pair interactions or hydrogen bonding
interactions when buried in the protein interior, and
hence have effectively reduced flexibility. The overall
flexibility of side chains of all natural residues is
therefore relatively small. This reduced flexibility may
be necessary to decrease the entropy opposing folding to
a compact state.

Examination of patterns in side chain rotamer libraries
further confirms this observation. We use a parameter
$f$ for the number of rotamers per atom, defined as
$f = N_r/N_a$, where $N_r$ is the number of all possible
rotamers for a specific side chain type and $N_a$ is the
number of heavy atoms in that side chain type. By the
criterion of hydrophobicity, we divide the twenty amino
acids into three categories: hydrophilic residues
(hydrophobicity < 0.3), hydrophobic residues
(hydrophobicity > 0.75), and neutral residues
(0.3 ≤ hydrophobicity ≤ 0.75). According to this
division and the values listed in Table II, there are
six hydrophobic residue types, seven hydrophilic residue
types, and seven neutral residue types.

We calculate the weighted expected number of rotamers
per atom $f$ for each residue group, where the weighting
factor is taken as the frequency of occurrence of the
specific amino acid residue type in eukaryotic proteins.
The number of possible rotamers for each residue type is
taken from a published rotamer library. The expected $f$
values are 1.10, 1.34, and 2.74 for hydrophobic, neutral,
and polar amino acid residues, respectively. Polar
residues have the largest $f$ value, but they are
frequently involved in electrostatic interactions and
hydrogen bonding, which significantly decreases the
actual flexibility of
TABLE II: Side chain flexibility of naturally occurring amino acid residues. H: hydrophobic residues, P: hydrophilic or 
polar residues, N: neutral residues, f = Nr/Na: number of rotamers per atom. Values in parentheses after residue names 
are hydrophobicity values of amino acid residues as given in the cited reference, and values in parentheses after f = Nr/Na values are the 
relative frequency of occurrence of the corresponding amino acid residue in percent. 



H          Nr/Na          N          Nr/Na          P           Nr/Na
F(1.0)     0.57(4.29)     A(0.62)    1.00(6.24)     D(0.028)    1.25(5.38)
I(0.94)    1.75(5.80)     C(0.68)    1.50(1.70)     E(0.043)    1.60(6.57)
L(0.94)    1.25(9.40)     G(0.50)    -(5.64)        H(0.165)    1.33(2.33)
V(0.83)    1.00(5.99)     M(0.74)    3.25(2.21)     K(0.283)    5.40(6.46)
W(0.87)    0.70(1.11)     P(0.71)    1.00(4.96)     N(0.236)    1.75(5.08)
Y(0.88)    0.50(3.15)     S(0.36)    1.50(8.67)     Q(0.251)    1.80(4.23)
                          T(0.45)    1.00(5.60)     R(0.000)    4.86(4.99)
Average    1.10(29.74)               1.34(35.02)                2.74(35.04)



polar side chains. In general, $f$ values for natural
amino acid residues are small, indicating that by the
criterion of weighted number of rotameric states per
side chain atom, they are rather inflexible.
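
The group averages in Table II can be reproduced with a short calculation. The sketch below uses the per-residue $f$ values and frequencies listed in the table and computes the frequency-weighted mean for each group. Excluding glycine, which has no $f$ value, from both the numerator and the weights is an assumption; with that choice the neutral group comes out at about 1.35, close to the tabulated 1.34. The dictionary layout and function name are hypothetical.

```python
# Table II entries as residue: (f = Nr/Na, frequency in percent).
# Glycine has no side chain rotamers, so its f value is recorded as None.
TABLE_II = {
    "hydrophobic": {
        "F": (0.57, 4.29), "I": (1.75, 5.80), "L": (1.25, 9.40),
        "V": (1.00, 5.99), "W": (0.70, 1.11), "Y": (0.50, 3.15),
    },
    "neutral": {
        "A": (1.00, 6.24), "C": (1.50, 1.70), "G": (None, 5.64),
        "M": (3.25, 2.21), "P": (1.00, 4.96), "S": (1.50, 8.67),
        "T": (1.00, 5.60),
    },
    "polar": {
        "D": (1.25, 5.38), "E": (1.60, 6.57), "H": (1.33, 2.33),
        "K": (5.40, 6.46), "N": (1.75, 5.08), "Q": (1.80, 4.23),
        "R": (4.86, 4.99),
    },
}

def weighted_mean_f(group):
    """Frequency-weighted mean of f, skipping residues without an f value (assumption)."""
    entries = [(f, w) for f, w in group.values() if f is not None]
    return sum(f * w for f, w in entries) / sum(w for _, w in entries)

group_means = {name: weighted_mean_f(group) for name, group in TABLE_II.items()}
for name, value in group_means.items():
    print(f"{name}: {value:.2f}")
```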

It is remarkable that the helix emerges as the preferred
main chain structure for compact main chain
conformations with maximum side chain entropy. Our
results indicate that the correlation between side
chains plays a significant role in protein entropy and
should be modeled more accurately. Real proteins
have far more complex side chains. For example,
to model a Lys residue realistically, a side chain model
of size 5 with all connecting bonds flexible is
needed. The associated side chain conformational
space is much larger than that of the model developed
in this study, and is therefore not amenable to
detailed analysis. However, we believe the conclusion
obtained using the M3 model, that inflexible side chains
reduce folding entropy, remains valid if a longer and
more flexible side chain model is used. In real
proteins, there are many residues whose side chains have
flexibility comparable to that of the M3 model (e.g., His,
Phe, Tyr, Val, Ser, Cys, if we regard the inflexible
part of their side chains as one side chain bead in
the M3 model). For these residues with reduced side
chain flexibility, we find that side chain entropy
promotes the formation of helices for compact main chain
conformations.

Estimating side chain entropy is an important
and difficult task for modeling protein structure and
protein stability. With explicit side chain models
on the three-dimensional tetrahedral lattice, we have
developed an algorithm that calculates the exact
side chain entropy of tetrahedral lattice models for
any given main chain structure of moderate length.
With the current implementation, it works well up to
a chain length of 30. We compare results of side chain
entropy calculated by rotamer counting and by the
exact method developed here. For longer chain polymers
(e.g., N = 30), we found that rotamer counting
can give significantly over-estimated side
chain entropy. For example, the average difference
between the two methods for models of length
N = 30 is larger for extended main chain structures
($R_g > 6.5$) and near compact main chain structures
($R_g = 2.7$ to $2.9$, Figure 13d).

The method for exact calculation of side chain entropy
given a backbone structure can be generalized.
For longer chains, a disconnected component in
the conflict graph could contain so many residues
that enumeration becomes infeasible. It is possible
to develop the algorithm further using the same
divide-and-conquer approach, where a large independent
component is decomposed into two disconnected
components of roughly equal size by removing a small
number of edges in the conflict graph.
As an illustration, the larger independent component
shown in the conflict graph figure, formed by monomers
1, 3, 5, and 7, can be decomposed into two small
disconnected components by cutting the edge between
vertices 4 and 6, which corresponds to the shared
position (labeled as "a"). The two smaller components
can then be enumerated separately, and side chains of
residues connected by the cut edges can also be
enumerated individually. These enumerations provide
an exact value for the total number of possible side
chain arrangements of the original larger independent
component. When a disconnected component contains a
large number of residues, finding an optimal
decomposition becomes difficult. This is related to the
graph partition problem. Although finding an optimal
solution to this problem is known to be NP-complete,
there are many effective approximation
and heuristic algorithms that are applicable for
obtaining a good decomposition.
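
One way such a decomposition could be realized is sketched below. Instead of the edge cut described above, the sketch conditions on a single separator residue, which is a related but different technique named here plainly as a substitute: for each site the separator residue can take, that site is removed from the other residues' options, and the two remaining groups are counted independently. The function names, the five-residue chain, and the choice of separator are hypothetical.

```python
from itertools import product

def brute_force(candidates):
    """Count clash-free joint assignments by exhaustive enumeration."""
    return sum(1 for c in product(*candidates) if len(set(c)) == len(c))

def count_by_separator(candidates, sep, left, right):
    """Count joint states by fixing residue `sep`, then counting two groups independently.

    Exact only if no residue in `left` shares a candidate site with a residue
    in `right`, so that the groups decouple once `sep` is fixed.
    """
    total = 0
    for site in candidates[sep]:
        left_sets = [[s for s in candidates[i] if s != site] for i in left]
        right_sets = [[s for s in candidates[i] if s != site] for i in right]
        total += brute_force(left_sets) * brute_force(right_sets)
    return total

# Hypothetical chain-like conflict graph: neighbouring residues share one site.
chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]

whole = brute_force(chain)
split = count_by_separator(chain, sep=2, left=[0, 1], right=[3, 4])
```

Both routes give the same count for this example, while the split version enumerates two groups of two residues for each separator state rather than all five residues at once.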

Side chains in natural amino acid residues are chiral,
and proteins are better characterized using chiral
models. Chiral models developed in this study
will be useful for exploration of other properties of
proteins where side chains play important roles.
Achiral models introduced here may also be useful
to study other polymers with transient chirality on
the backbone, or branched polymers such as peptoids,
in which the chirality on the nitrogen atom is unstable
and the side chain can easily convert between opposite
configurations.